Intro to NGS processing

James A. Fellows Yates

2021-08-17

Who am I?

  • Education
    • B.Sc. Bioarchaeology (University of York, UK)
    • M.Sc. Naturwissenschaftliches Archäologie (University of Tübingen, DE)
    • Ph.D. Archaeogenetics (MPI-SHH / MPI-EVA, DE)
  • Experience
    • Number of genetics classes taken: 0
    • Number of bioinformatics classes taken: 0

@jfy133

Today we will

  1. Introduce what DNA sequencing is
  2. Explain how Illumina NGS data is generated
  3. Show how to evaluate NGS data [Practical]

Introduction to DNA

What is DNA?

Deoxyribonucleic acid (/diːˈɒksɪˌraɪboʊnjuːˌkliːɪk, -ˌkleɪ-/; DNA) is a molecule composed of two polynucleotide chains that coil around each other to form a double helix carrying genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. - Wikipedia

What is DNA?

Structure of DNA

The rules

  • Four nucleotides
    • Pyrimidines: Cytosine, Thymine
    • Purines: Guanine, Adenine
  • Base pairing: one pyrimidine with one purine
    • C with G (think: CGI)
    • A with T (think: AT-AT walker)
  • Complementary
    • C on one strand, G on the other (or v.v.)
    • A on one strand, T on the other (or v.v.)
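The pairing rules above can be written down as a small lookup table. This is only a minimal sketch for illustration: the names `PAIRS`, `complement` and `is_complementary` are made up here, and strand direction is ignored.

```python
# Base-pairing rules from the slide: C pairs with G, A pairs with T.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def complement(strand: str) -> str:
    """Return the pairing base for each position of `strand`, in the same order."""
    return "".join(PAIRS[base] for base in strand.upper())


def is_complementary(strand_a: str, strand_b: str) -> bool:
    """True if the two strands have equal length and pair up at every position."""
    return len(strand_a) == len(strand_b) and complement(strand_a) == strand_b.upper()


# Every allowed pair joins exactly one purine with one pyrimidine.
assert all((a in PURINES) != (b in PURINES) for a, b in PAIRS.items())

print(complement("ACTG"))  # TGAC
```

Because each base has exactly one partner, knowing one strand is enough to write out the other.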

The rules

  • Make copy of a DNA strand with a polymerase
    • Unwind the DNA
    • Separate the strands
    • Make new strand: find a C, add a new G (etc.)
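The copying steps above can be mimicked in a few lines. This is a toy sketch, not a model of a real polymerase: the function names are hypothetical, the double helix is assumed to be a simple pair of strings, and strand direction and copying errors are ignored.

```python
# Pairing rules: C with G, A with T.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def separate(double_helix):
    """Unwind the helix and hand back its two strands as separate templates."""
    top, bottom = double_helix
    return top, bottom


def synthesise(template):
    """Walk along a template and add the pairing base at each position."""
    new_strand = []
    for base in template:
        new_strand.append(PAIRS[base])  # find a C, add a new G (etc.)
    return "".join(new_strand)


def replicate(double_helix):
    """Return two double helices, each with one old and one new strand."""
    top, bottom = separate(double_helix)
    return (top, synthesise(top)), (synthesise(bottom), bottom)


copy_1, copy_2 = replicate(("ACTG", "TGAC"))
print(copy_1, copy_2)  # ('ACTG', 'TGAC') ('ACTG', 'TGAC')
```

Both copies come out identical to the original, which is the point of complementary pairing: each old strand carries enough information to rebuild its partner.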

DNA replication split

How do we get DNA?

Figure 17 01 02

Introduction to DNA Sequencing

What is Sequencing?

Converting the chemical nucleotides of a DNA molecule

to

ACTG on your computer screen

Historically

  • Sanger sequencing

Sanger-sequencing

  • Separate strands, add primer (starting point)
  • Add mix of nucleotides, some with special ‘terminators’
  • Pass through size-filtering, read order of terminators
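The three steps above can be sketched as a toy simulation. It assumes an idealised reaction in which a terminated fragment exists for every position, which real chemistry only approximates; `sanger_read` is a hypothetical name used only for this example.

```python
import random

# Pairing rules: C with G, A with T.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_read(template: str, seed: int = 0) -> str:
    """Recover the newly built strand from its terminated fragments."""
    # The strand a polymerase would build along the template, starting at the primer.
    full_copy = "".join(PAIRS[base] for base in template)

    # Each fragment stops where a 'terminator' nucleotide happened to be added.
    fragments = [full_copy[:n] for n in range(1, len(full_copy) + 1)]

    # In the tube the fragments are all mixed together.
    random.Random(seed).shuffle(fragments)

    # Size-filtering: order the fragments from shortest to longest.
    fragments.sort(key=len)

    # Reading the terminator at the end of each fragment, in size order,
    # spells out the sequence.
    return "".join(fragment[-1] for fragment in fragments)


print(sanger_read("ACTG"))  # TGAC
```

Note that the read is the strand that was built, i.e. the complement of the template.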

Pros and cons of Sanger Sequencing

  • Pros
    • More accurate (fewer errors)
    • Longer reads
  • Cons
    • Resource heavy: needs a lot of input DNA
    • Slow: one. fragment. at. a. time.

What is NGS?

  • NGS: Next Generation Sequencing
    • MASSIVELY multiplexed!
    • Sequence millions and even billions of DNA reads at once!

Not really ‘next’ anymore; think of it as ‘second’ generation (see: Nanopore)

What is NGS?

Market leader:

Illumina HiSeq 2500

(Others: Roche 454, PacBio, Ion Torrent, etc.)

How does it work?

Basically the same concept, but with pretty pictures!

i.e. attach fluorescent nucleotides, one colour per A, C, G, T

A

G

T

C

Fire a laser and take a picture!
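Turning those pictures back into letters can be sketched as "pick the brightest colour in each picture". This is a deliberately simplified illustration: the intensity values are made up, `call_bases` is a hypothetical name, and real base-calling software does considerably more than this.

```python
# One colour channel per nucleotide.
CHANNELS = ("A", "C", "G", "T")


def call_bases(pictures):
    """Turn a series of pictures of one spot into a DNA read.

    `pictures` holds one dict per picture, mapping each base to the
    brightness measured in that base's colour channel.
    """
    read = []
    for intensities in pictures:
        # The brightest colour tells us which nucleotide was just attached.
        read.append(max(CHANNELS, key=lambda base: intensities[base]))
    return "".join(read)


# Made-up brightness values for four pictures of the same spot.
pictures = [
    {"A": 0.9, "C": 0.1, "G": 0.0, "T": 0.1},
    {"A": 0.1, "C": 0.0, "G": 0.8, "T": 0.2},
    {"A": 0.0, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.2, "C": 0.9, "G": 0.1, "T": 0.0},
]
print(call_bases(pictures))  # AGTC
```

One picture gives one letter, so a read of 100 bases needs 100 rounds of attaching a nucleotide and taking a picture.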

But first!

Flow cell

Lawn